{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "j4qMNV_533ls"
},
"source": [
"##### Copyright 2024 Google LLC."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "urf-mQKk348O"
},
"outputs": [],
"source": [
"# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iS5fT77w4ZNj"
},
"source": [
"# Gemma - Run with Ollama Python library\n",
"\n",
"Author: Sitam Meur\n",
"\n",
"* GitHub: [github.com/sitamgithub-MSIT](https://github.com/sitamgithub-MSIT/)\n",
"* X: [@sitammeur](https://x.com/sitammeur)\n",
"\n",
"Description: This notebook demonstrates how you can run inference on a Gemma model using [Ollama Python library](https://github.com/ollama/ollama-python). The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.\n",
"\n",
"<table align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Gemma/[Gemma_2]Using_with_Ollama_Python.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FF6vOV_74aqj"
},
"source": [
"## Setup\n",
"\n",
"### Select the Colab runtime\n",
"To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the Gemma model. In this case, you can use a T4 GPU:\n",
"\n",
"1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.\n",
"2. Select **Change runtime type**.\n",
"3. Under **Hardware accelerator**, select **T4 GPU**."
]
},
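{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, run the cell below to confirm that a GPU is attached to the runtime. It simply calls `nvidia-smi`; if it fails, revisit the runtime settings above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional check: list the GPU attached to this Colab runtime\n",
"!nvidia-smi"
]
},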
{
"cell_type": "markdown",
"metadata": {
"id": "4tlnekw44gaq"
},
"source": [
"## Installation"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AL7futP_4laS"
},
"source": [
"Install Ollama through the offical installation script."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "MuV7cWtcAoSV"
},
"outputs": [],
"source": [
"!curl -fsSL https://ollama.com/install.sh | sh"
]
},
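{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, confirm the installation by printing the Ollama version. (The CLI may warn that no server is running yet; that is expected at this point.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional check: print the installed Ollama version\n",
"!ollama --version"
]
},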
{
"cell_type": "markdown",
"metadata": {
"id": "FpV183Rv6-1P"
},
"source": [
"Install Ollama Python library through the official Python client for Ollama."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0Mrj29SH-3OD"
},
"outputs": [],
"source": [
"!pip install -q ollama"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iNxJFvGIe48_"
},
"source": [
"## Start Ollama\n",
"\n",
"Start Ollama in background using nohup."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5CX39xKVe9UN"
},
"outputs": [],
"source": [
"!nohup ollama serve > ollama.log &"
]
},
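{
"cell_type": "markdown",
"metadata": {},
"source": [
"The server can take a few seconds to come up. The cell below is a minimal readiness check (not part of the Ollama library): it polls Ollama's default local endpoint, `http://127.0.0.1:11434`, until the server responds. If you changed the host or port, adjust the URL accordingly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"import urllib.error\n",
"import urllib.request\n",
"\n",
"# Poll the default Ollama endpoint (assumed address: http://127.0.0.1:11434) until it responds\n",
"OLLAMA_URL = \"http://127.0.0.1:11434\"\n",
"\n",
"for _ in range(30):\n",
"    try:\n",
"        with urllib.request.urlopen(OLLAMA_URL) as resp:\n",
"            print(f\"Ollama server is up (HTTP {resp.status})\")\n",
"            break\n",
"    except (urllib.error.URLError, ConnectionError):\n",
"        time.sleep(1)\n",
"else:\n",
"    print(\"Ollama server did not respond in time; check ollama.log\")"
]
},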
{
"cell_type": "markdown",
"metadata": {
"id": "6YfDqlyo46Rp"
},
"source": [
"## Prerequisites\n",
"\n",
"* Ollama should be installed and running. (This was already completed in previous steps.)\n",
"* Pull the gemma2 model to use with the library: `ollama pull gemma2:2b`\n",
" * See [Ollama.com](https://ollama.com/) for more information on the models available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "oPU5dA1-B5Fn"
},
"outputs": [],
"source": [
"import ollama"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NE1AWlucBza_"
},
"outputs": [],
"source": [
"ollama.pull('gemma2:2b')"
]
},
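{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, list the locally available models to confirm that `gemma2:2b` was pulled. The exact structure of the response can vary between library versions, so the cell below simply prints it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional check: list local models to confirm the pull succeeded\n",
"print(ollama.list())"
]
},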
{
"cell_type": "markdown",
"metadata": {
"id": "KL5Kc6HaKjmF"
},
"source": [
"## Inference\n",
"\n",
"Run inference using Ollama Python library."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HN0nrhpmFUUB"
},
"source": [
"### Generate"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "AC5FsDQuFZfb"
},
"outputs": [],
"source": [
"import markdown\n",
"from ollama import generate\n",
"\n",
"# Generate a response to a prompt\n",
"response = generate(\"gemma2:2b\", \"Explain the process of photosynthesis.\")\n",
"print(response[\"response\"])"
]
},
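{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Generation options\n",
"\n",
"You can also pass sampling parameters through the `options` argument, which accepts the same keys as an Ollama Modelfile (for example, `temperature` and `num_predict`). The values below are arbitrary and only illustrate the call shape."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Generate with explicit sampling options (example values are arbitrary)\n",
"response = generate(\n",
"    \"gemma2:2b\",\n",
"    \"Explain the process of photosynthesis.\",\n",
"    options={\"temperature\": 0.7, \"num_predict\": 128},\n",
")\n",
"print(response[\"response\"])"
]
},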
{
"cell_type": "markdown",
"metadata": {
"id": "p8v050JkFhuY"
},
"source": [
"#### Streaming Responses\n",
"\n",
"To enable response streaming, set `stream=True`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jw8UD2v5Fms6"
},
"outputs": [],
"source": [
"# Stream the generated response\n",
"response = generate('gemma2:2b', 'Explain the process of photosynthesis.', stream=True)\n",
"\n",
"for part in response:\n",
" print(part['response'], end='', flush=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "utyVRlIYFvdm"
},
"source": [
"#### Async client\n",
"\n",
"To make asynchronous requests, use the `AsyncClient` class."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JVvKQE0XF2kh"
},
"outputs": [],
"source": [
"import asyncio\n",
"import nest_asyncio\n",
"from ollama import AsyncClient\n",
"\n",
"nest_asyncio.apply()\n",
"\n",
"\n",
"async def generate():\n",
" \"\"\"\n",
" Asynchronously generates a response to a given prompt using the AsyncClient.\n",
"\n",
" This function creates an instance of AsyncClient and sends a request to generate\n",
" a response for the specified prompt. The response is then printed.\n",
" \"\"\"\n",
" # Create an instance of the AsyncClient\n",
" client = AsyncClient()\n",
"\n",
" # Send a request to generate a response to the prompt\n",
" response = await client.generate(\n",
" \"gemma2:2b\", \"Explain the process of photosynthesis.\"\n",
" )\n",
" print(response[\"response\"])\n",
"\n",
"# Run the generate function\n",
"asyncio.run(generate())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WFyi_zzWAwe7"
},
"source": [
"### Chat"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UN20MoSv_S76"
},
"outputs": [],
"source": [
"from ollama import chat\n",
"\n",
"# Start a conversation with the model\n",
"messages = [\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": \"What is keras?\",\n",
" },\n",
"]\n",
"\n",
"# Get the model's response to the message\n",
"response = chat(\"gemma2:2b\", messages=messages)\n",
"print(response[\"message\"][\"content\"])"
]
},
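{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Multi-turn conversations\n",
"\n",
"Chat calls are stateless, so to continue a conversation you append the model's reply and your next message to the `messages` list and call `chat` again. The follow-up question below is just an example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Continue the conversation: keep the assistant's reply and add a follow-up question\n",
"messages.append(response[\"message\"])\n",
"messages.append({\"role\": \"user\", \"content\": \"How does it relate to TensorFlow?\"})\n",
"\n",
"follow_up = chat(\"gemma2:2b\", messages=messages)\n",
"print(follow_up[\"message\"][\"content\"])"
]
},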
{
"cell_type": "markdown",
"metadata": {
"id": "HZGsL1FI7kj8"
},
"source": [
"#### Streaming Responses\n",
"\n",
"To enable response streaming, set `stream=True`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UdxAuS1C7lm_"
},
"outputs": [],
"source": [
"# Stream the chat response\n",
"stream = chat(\n",
" model=\"gemma2:2b\",\n",
" messages=[{\"role\": \"user\", \"content\": \"What is keras?\"}],\n",
" stream=True,\n",
")\n",
"\n",
"for chunk in stream:\n",
" print(chunk[\"message\"][\"content\"], end=\"\", flush=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "czWceNYOEizg"
},
"source": [
"#### Async client + Streaming\n",
"\n",
"To make asynchronous requests, use the `AsyncClient` class, and for streaming, use `stream=True`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YJmd92z1IUVl"
},
"outputs": [],
"source": [
"import asyncio\n",
"import nest_asyncio\n",
"from ollama import AsyncClient\n",
"\n",
"nest_asyncio.apply()\n",
"\n",
"\n",
"async def chat():\n",
" \"\"\"\n",
" Asynchronously sends a chat message to the specified model and prints the response.\n",
"\n",
" This function sends a message with the role \"user\" and the content \"What is keras?\"\n",
" to the model \"gemma2:2b\" using the AsyncClient's chat method. The response is then streamed.\n",
" \"\"\"\n",
" # Define the message to send to the model\n",
" message = {\"role\": \"user\", \"content\": \"What is keras?\"}\n",
"\n",
" # Send the message to the model and print the response\n",
" async for part in await AsyncClient().chat(\n",
" model=\"gemma2:2b\", messages=[message], stream=True\n",
" ):\n",
" print(part[\"message\"][\"content\"], end=\"\", flush=True)\n",
"\n",
"# Run the chat function\n",
"asyncio.run(chat())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cDr8VzvGIXFC"
},
"source": [
"## Conclusion\n",
"\n",
"Congratulations! You have successfully run inference on a Gemma model using the Ollama Python library. You can now integrate this into your Python projects."
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"name": "[Gemma_2]Using_with_Ollama_Python.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}